Diversity of fish, vulnerable species, and species of Outstanding Universal Value in Marine World Heritage Sites

This notebook explores species diversity in Marine World Heritage Sites using OBIS data. This is work in progress.

How to use this notebook

Optionally remove the following cache files before running the notebook:

  • occurrence.Rdata: occurrences by site
  • bold.Rdata: statistics on barcode sequences in BOLD for all fish species

If necessary, adjust fish taxonomy below:

fish_classes <- c("Actinopteri", "Cephalaspidomorphi", "Myxini", "Petromyzonti", "Elasmobranchii", "Holocephali", "Coelacanthi")
fish_orders <- c("Cetomimiformes", "Gasterosteiformes", "Scorpaeniformes", "Stephanoberyciformes")

Other resources

A separate repository has been set up for compiling species list based on existing publications, see https://github.com/iobis/mwhs-species-lists.

Dependencies

library(dplyr)
library(caspr)
library(rredlist)
library(knitr)
library(ggplot2)
library(sf)
library(mapview)
library(concaveman)
library(sfheaders)
library(purrr)
library(robis)
library(stringr)
library(ftplottools)
library(ggrepel)

Compiling a list of all WoRMS species including IUCN Red List category and BOLD barcode statistics

First read all accepted WoRMS species. As it’s not possible to get a full species list from the WoRMS or GBIF web services, I’m reading directly from an export provided by the WoRMS team. The list of fish classes and orders above is used to determine if species are fish or not.

worms <- read.csv("data/taxon.txt", sep = "\t", quote = "") %>%
  as_tibble() %>%
  filter(taxonRank == "Species" & taxonomicStatus == "accepted") %>%
  select(taxonID, scientificName, phylum, class, order) %>%
  distinct() %>%
  mutate(is_fish = class %in% fish_classes | order %in% fish_orders) %>%
  mutate(aphiaID = as.integer(str_replace(taxonID, "urn:lsid:marinespecies.org:taxname:", "")))

Assign Red List categories

Here I’m using the rredlist package to get all Red List species from IUCN. Only the extinct and threatened categories are kept: CR, EN, VU, EW, EX.

get_redlist_species <- function() {
  redlist <- tibble()
  page <- 0
  while (TRUE) {
    res <- rl_sp(page, key = "a936c4f78881e79a326e73c4f97f34a6e7d8f9f9e84342bff73c3ceda14992b9")$result
    if (length(res) == 0) {
      break
    }
    redlist <- bind_rows(redlist, res)
    page <- page + 1
  }
  redlist <- redlist %>%
    as_tibble() %>%
    filter(category %in% c("CR", "EN", "VU", "EW", "EX")) %>%
    mutate(category = factor(category, levels = c("EX", "EW", "CR", "EN", "VU"))) %>%
    group_by(scientific_name) %>%
    filter(row_number() == 1) %>%
    ungroup()
  return(redlist)  
}

redlist <- get_redlist_species()

Now we can label Red List species in the worms data frame.

worms <- worms %>%
  left_join(redlist %>% select(scientific_name, category), by = c("scientificName" = "scientific_name"))

Get BOLD barcode statistics for all fish species

Let’s check BOLD for barcode sequences using all fish species in WoRMS:

# fixes issue with bold package
invisible(Sys.setlocale("LC_ALL", "C"))

if (!file.exists("bold.Rdata")) {
  fish_species <- worms %>% pull(scientificName) %>% unique()
  bold_list <- sapply(fish_species, function(x) NULL)
  for (i in 1:length(bold_list)) {
    message(i, " ", names(bold_list)[i])
    if (is.null(bold_list[[i]])) {
      bold_list[[i]] <- tryCatch({
        caspr::bold_statistics(names(bold_list)[i])
      }, warning = function(warning_condition) {
      }, error = function(error_condition) {
      }, finally = {
      })
    }
  }
  save(bold_list, file = "bold.Rdata")
} else {
  load("bold.Rdata")
}

sequence_numbers <- unlist(map(bold_list, nrow))
fish_sequences <- tibble(species = names(sequence_numbers), sequences = sequence_numbers) %>%
  filter(sequences > 0)

worms <- worms %>%
  left_join(fish_sequences, by = c("scientificName" = "species"))

Statistics

Now calculate some statistics:

worms %>%
  summarize(
    species = n(),
    fish = sum(is_fish),
    vulnerable = length(na.omit(category)),
    vulnerable_fish = sum(is_fish * !is.na(category)),
    barcode_fish = sum(sequences > 0, na.rm = T)
  ) %>%
  kable(format.args = list(big.mark = ","))
species fish vulnerable vulnerable_fish barcode_fish
222,644 20,122 1,471 855 9,534

Note that these numbers are slightly inflated due to homonyms, even after removing unaccepted names.

ggplot(worms %>% filter(!is.na(category))) +
  geom_bar(aes(x = phylum, fill = category)) +
  coord_flip() +
  scale_fill_viridis_d(direction = -1)

Marine World Heritage Sites statistics

In this section we will look at the diversity of fish and vulnerable species in each marine World Heritage site.

Fetch spatial data

First fetch the Marine World Heritage Sites shapefile from MarineRegions:

if (!file.exists("data/shape/worldheritagemarineprogramme.shp")) {
  download.file("http://geo.vliz.be/geoserver/wfs?request=getfeature&service=wfs&version=1.0.0&typename=MarineRegions:worldheritagemarineprogramme&outputformat=SHAPE-ZIP", "data/shape.zip")
  unzip("data/shape.zip", exdir = "shape")
}

shapes <- st_read("data/shape/worldheritagemarineprogramme.shp") %>%
  select(full_name, country, geometry) %>%
  mutate(full_name = iconv(full_name, "latin1", "UTF-8"))
## Reading layer `worldheritagemarineprogramme' from data source 
##   `/Users/pieter/Dropbox (IPOfI)/werk/projects/MWHS/notebook-mwhs/data/shape/worldheritagemarineprogramme.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 129 features and 15 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -180 ymin: -55.00039 xmax: 180 ymax: 71.80583
## Geodetic CRS:  WGS 84

Let’s create a map.

mapviewOptions(fgb = FALSE)
mapview(shapes)@map

Process spatial data

Reduce the complexity of the spatial geometries by removing holes, then merge geometries by site.

shapes_processed <- shapes %>%
  mutate(geometry = st_make_valid(sf_remove_holes(geometry))) %>%
  group_by(full_name) %>%
  summarize(geometry = st_union(geometry))

mapview(shapes, color = "red", lwd = 0.5, alpha = 1, alpha.regions = 0) +
  mapview(shapes_processed, color = "green", lwd = 0.5, alpha = 1, alpha.regions = 0, legend = FALSE)

Fetch occurrence data

Now retrieve data from OBIS by area. OBIS is queried by bounding box, but a point-in-polygon calculation is used to discard points outside the areas.

occ_for_geom <- function(geom) {
  wkt <- st_as_text(st_as_sfc(st_bbox(geom)), digits = 6)
  message(wkt)
  occ <- occurrence(geometry = wkt, fields = c("decimalLongitude", "decimalLatitude", "date_year", "scientificName", "aphiaID", "dataset_id")) %>%
    st_as_sf(coords = c("decimalLongitude", "decimalLatitude"), crs = 4326)
  occ_filtered <- occ %>%
    filter(st_intersects(geometry, geom, sparse = FALSE)) %>%
    as_tibble() %>%
    select(-geometry)
  return(occ_filtered)
}

if (!file.exists("occurrence.Rdata")) {
  occs <- map(shapes_processed$geometry, occ_for_geom)
  for (i in 1:nrow(shapes_processed)) {
    occs[[i]]$full_name <- shapes_processed$full_name[i]
  }
  occ <- bind_rows(occs)
  save(occ, file = "occurrence.Rdata")
} else {
  load("occurrence.Rdata")
}

occ <- occ %>%
  inner_join(worms %>% select(is_fish, aphiaID, category, sequences), by = "aphiaID") %>%
  mutate(
    aphia_fish = ifelse(is_fish == TRUE, aphiaID, NA),
    aphia_vulnerable = ifelse(!is.na(category), aphiaID, NA),
    aphia_vulnerable_fish = ifelse(is_fish == TRUE & !is.na(category), aphiaID, NA),
    aphia_barcode_fish = ifelse(is_fish == TRUE & sequences > 0, aphiaID, NA),
  ) 

Calculate statistics

occ %>%
  summarize(
    records = n(),
    species = length(unique(aphiaID)),
    fish = length(unique(aphia_fish)),
    vulnerable = length(unique(aphia_vulnerable)),
    vulnerable_fish = length(unique(aphia_vulnerable_fish)),
    barcode_fish =  length(unique(aphia_barcode_fish))
  ) %>%
  kable(format.args = list(big.mark = ","))
records species fish vulnerable vulnerable_fish barcode_fish
2,656,222 25,579 6,232 472 269 4,537
site_stats <- occ %>%
  group_by(full_name) %>%
  summarize(
    records = n(),
    species = length(unique(aphiaID)),
    fish = length(unique(aphia_fish)),
    vulnerable = length(unique(aphia_vulnerable)),
    vulnerable_fish = length(unique(aphia_vulnerable_fish)),
    barcode_fish =  length(unique(aphia_barcode_fish))
  )

xlsx::write.xlsx(site_stats, file = "output/sites.xlsx")

site_stats %>%
  kable(format.args = list(big.mark = ","))
full_name records species fish vulnerable vulnerable_fish barcode_fish
Aldabra Atoll 3,184 1,040 501 65 7 456
Area de Conservaci<U+00F3>n Guanacaste 941 310 307 12 9 212
Banc d’Arguin National Park 113 41 23 5 4 12
Belize Barrier Reef Reserve System 8,365 1,067 349 33 27 318
Brazilian Atlantic Islands: Fernando de Noronha and Atol das Rocas Reserves 920 268 123 11 10 108
Cocos Island National Park 2,301 492 410 50 48 311
Coiba National Park and its Special Zone of Marine Protection 2,380 489 446 31 27 305
East Rennell 13 5 2 1 1 1
Everglades National Park 58,637 831 227 21 14 193
Gal<U+00E1>pagos Islands 57,392 1,524 728 86 65 516
Gough and Inaccessible Islands 2,482 195 45 12 5 38
Great Barrier Reef 1,229,612 12,702 2,862 220 90 2,304
Gulf of Porto: Calanche of Piana, Gulf of Girolata, Scandola Reserve 76 16 2 2 1 2
Ha Long Bay 2,011 323 320 8 8 292
Heard and McDonald Islands 38,724 256 26 10 1 23
High Coast / Kvarken Archipelago 16,345 128 19 2 2 18
Ibiza, Biodiversity and Culture 53 29 14 3 2 14
Islands and Protected Areas of the Gulf of California 10,665 794 611 52 34 403
Kluane / Wrangell-St Elias / Glacier Bay / Tatshenshini-Alsek 505 97 73 8 2 69
Komodo National Park 563 334 216 11 10 173
Lagoons of New Caledonia: Reef Diversity and Associated Ecosystems 30,840 4,560 886 42 15 715
Macquarie Island 239,191 628 91 15 1 80
Malpelo Fauna and Flora Sanctuary 37,487 1,013 519 67 55 364
Natural System of Wrangel Island Reserve 157 55 11 6 1 11
New Zealand Sub-Antarctic Islands 15,500 1,350 121 30 10 106
Ningaloo Coast 255,421 3,312 931 102 50 819
Ogasawara Islands 244 149 122 6 5 111
Papahanaumokuakea 280,746 1,669 572 35 16 454
Pen<U+00ED>nsula Vald<U+00E9>s 2,860 193 5 6 1 3
Phoenix Islands Protected Area 2,555 490 364 9 5 319
Puerto-Princesa Subterranean River National Park 112 99 28 2 2 27
Rock Islands Southern Lagoon 4,745 892 674 19 13 597
Shark Bay, Western Australia 9,750 1,312 532 40 15 439
Shiretoko 189 41 41 2 1 37
Sian Ka’an 441 109 74 6 4 71
Socotra Archipelago 3,443 529 69 20 2 61
St Kilda 3,187 104 17 9 3 17
Surtsey 7 3 1 3 1 1
The Wadden Sea 318,496 791 100 17 11 97
Tubbataha Reefs Natural Park 165 31 9 8 4 9
Ujung Kulon National Park 186 42 2 8 1 2
West Norwegian Fjords ? Geirangerfjord and N<U+00E6>r<U+00F8>yfjord 10 6 1 2 1 1
Whale Sanctuary of El Vizcaino 683 114 108 11 10 88
iSimangaliso Wetland Park 14,525 2,024 954 80 62 843
subset <- site_stats %>%
  filter(full_name %in% c("Great Barrier Reef", "Belize Barrier Reef Reserve System", "Ha Long Bay", "The Wadden Sea", "Heard and McDonald Islands", "Aldabra Atoll")) %>%
  arrange(full_name)

subset$x <- c(600, 100, 800, 500, 10, 60)
subset$y <- c(140, 80, 210, 3, 20, 40)

ggplot() +
  geom_segment(subset, mapping = aes(x = x, y = y, xend = fish, yend = vulnerable)) +
  geom_label(subset, mapping = aes(x = x, y = y, label = full_name), size = 3.5, label.size = 0) +
  geom_point(site_stats, mapping = aes(x = fish, y = vulnerable, size = species, color = species), shape = 21, stroke = 1.2, fill = "white") +
  scale_x_continuous(trans = "log10") +
  scale_y_continuous(trans = "log10") +
  scale_size(range = c(1, 8), breaks = c(1000, 5000, 10000)) +
  scale_color_viridis_c(end = 0.8, trans = "log10", breaks = c(1000, 5000, 10000)) +
  guides(color = guide_legend(), size = guide_legend()) +
  xlab("fish species") + ylab("vulnerable species") +
  ft_theme() +
  ggtitle("Fish and vulnerable species diversity at marine World Heritate sites")

ggsave("output/sites.png", width = 10, height = 6, dpi = 600)

Species lists by site

species_lists <- occ %>%
  group_by(full_name, scientificName, is_fish, category, sequences) %>%
  summarize() %>%
  select(site = full_name, scientificName, is_fish, iucn_category = category, bold_sequences = sequences) %>%
  arrange(site, scientificName) %>%
  ungroup()

xlsx::write.xlsx(as.data.frame(species_lists), file = "output/species_lists.xlsx", showNA = FALSE, row.names = FALSE)
xlsx::write.xlsx(as.data.frame(species_lists %>% filter(is_fish)), file = "output/species_lists_fish.xlsx", showNA = FALSE, row.names = FALSE)